Online Prediction of Molded Part Quality in the Injection Molding Process Using High-Resolution Time Series

Process-data-supported process monitoring in injection molding plays an important role in compensating for disturbances in the process. Until now, scalar process data from machine controls have been used to predict part quality. In this paper, we investigated the feasibility of incorporating time series of sensor measurements directly as features for machine learning models, as a suitable method of improving the online prediction of part quality. We present a comparison of several state-of-the-art algorithms, using extensive and realistic data sets. Our comparison demonstrates that time series data allow significantly better predictions of part quality than scalar data alone. In future studies, and in production-use cases, such time series should be taken into account in online quality prediction for injection molding.


Introduction
Injection molding is one of the most widely used industrial plastic processing methods. Around 110,000 new injection molding machines are put into operation worldwide every year. With an average service life of 10 years, more than 1 million injection molding machines are currently in industrial use. On average, about five to six molds run on each of these machines. As a specific configuration is set for each mold, 5 to 6 million different running injection molding processes can be expected [1].
Despite extensive setting parameters and integrated control systems in standard injection molding machines, the influence of disturbance variables repeatedly causes fluctuations in process stability during ongoing production, and thus, deviations in the quality of the injection-molded parts.
To compensate for the resulting production capacity losses, many approaches to monitoring and optimizing the injection molding process have been developed. Recent approaches have been dedicated to the evaluation of process data, which provide information about the course of the injection molding process: based on this data, disturbing influences on the process can be detected at an early stage, and dampened or compensated for accordingly. The process data come from pressure, temperature, force, and displacement sensors, and are collected during each injection molding cycle. Many studies have shown how partial aspects of the injection molding process can be monitored or optimized based on indices derived from the process data, or modeling concepts based on process data [2][3][4][5][6][7][8].
One approach to detecting process instabilities in the injection molding process is to predict the quality properties of the molded parts. Consistency in the part quality properties For the successful application of online part quality prediction in industrial applications, the accuracy of the predictions (the deviation between predicted and actual value for a specific quality property) must be reliably high. As the relationships between process data and part quality are very complex, machine learning models are employed to automatically learn functions that approximate the actual quality property based on certain sensor data features: their accuracy depends on the choice of features (the type of process data), and on an appropriate mathematical approximation model (the machine learning method). For a reliable impression of the performance of such models and the selected features, it is imperative to test and evaluate them on the basis of extensive data sets that represent a realistic, industrial injection molding process. Because the learning process in machine learning involves random choices, such approaches have to be compared in repeated experiments, to mitigate the influence of random factors.
To that end, this paper examined and compared several state-of-the-art machine learning algorithms, to assess their prediction performance. To determine the influence of the feature data category on the prediction quality, each combination of model and feature data category was evaluated separately: this meant that, initially, all machine learning models were trained solely on scalar data, as in previous approaches; then, similar evaluations were conducted, but using either the time series only or the combination of both scalar and time series data.
For the experiments, we created three extensive data sets of cycles and parts, containing 1167 samples, 829 samples, and 1332 samples, respectively: thus, we were able to compare the algorithms' performance for different types of produced parts, made of polyamide filled with 30% glass fiber, from two different manufacturers (see Section 2.2). Moreover, in each data set, various process states were artificially induced by manually changing the disturbance variables: thus, the data sets were more representative of various actual industrial production environments and, therefore, the resulting machine learning models were more broadly applicable.
Each data set contained scalar quantities as well as time series; furthermore, for each produced part, two quality properties were recorded: the weight and a geometric dimension of the molded part. It is the goal of this research to make such quality measurements unnecessary in the industrial setting; however, for the experiments, these measurements were used as a gold standard against which the machine learning model predictions were compared.
In the selection of the sensors whose measurements were included in the study, we again focused on broad applicability: industrially used injection molds are not always equipped with in-mold sensors, for technical or economic reasons, whereas the injection pressure and injection flow curves can be made available on all modern injection molding machines. In-mold sensors were therefore deliberately omitted in this study, in order to make the findings applicable to molds without them.
The three data sets were collected with a novel software tool, called AVAPS, that allows the real-time query of high-resolution time series, in addition to scalar data, from standard injection molding machines (not modified for research purposes) under industrial conditions: this approach allowed us to export high-resolution (>100 Hz) time series inline from a standard injection molding machine control, without the use of additional hardware, such as measurement amplifiers. AVAPS directly provides the means to run the most successful models in industry, without further hardware or any kind of modification to the machines.
In summary, the three main contributions of this article are:
1. Through a novel comparison, it is shown that high-resolution time series fed directly into ML models (without reducing their information content through the prior formation of indices) are essential features for quality prediction. This is shown by comparing models using only scalar data, only time series data, and a combination of both, as features in state-of-the-art machine learning models.
2. The presented approaches are feasible for the quality prediction of part weights and geometric dimensions, and achieve high prediction quality based solely on the high-resolution injection pressure and injection flow curves from the machine. All the data used are available on modern standard injection molding machines, without the need for in-mold sensors or other additional hardware.
3. Large-volume experiments were carried out, in which realistic manufacturing conditions were simulated by artificially inducing disturbances. The resulting extensive data sets allowed the validation of the findings, suggesting their generalizability for similar injection molding processes: thus, they can serve as a baseline for future research. The data will be made publicly available with the publication of this study.

Materials and Methods
In the following, Section 2.1 describes the equipment and experimental setup used to generate the data. The general structure of the data sets is explained in Section 2.2. In Sections 2.2.2-2.2.4, the individual data sets are presented, the underlying experimental plans are explained, and the measured part properties are shown.

Experimental Setup
All experiments were carried out in a specially built, fully automatic injection molding measuring cell. The aim was to operate an unmodified injection molding machine in industrial, fully automatic mode, while retrieving, in real time, all process data from the machine control and, at the same time, to be able to measure the part quality characteristics online, without human influence and without varying time delays, over several hundred injection molding cycles.
A conventional injection molding machine, an Allrounder 520E 1500-800 (manufactured by Arburg GmbH + Co KG, 72290 Loßburg, Germany) with a screw diameter of 45 mm, was used. Two different single-cavity molds, with a hot runner, were mounted on the machine for the experiments. Both the molds and the part geometries were chosen to represent an industrial application. The injection molding machine was equipped with an OPC UA server interface according to the EUROMAP 77 standard, and was integrated into a local IP network. A standalone software tool (AVAPS 1.0, 34131 Kassel, Germany) was programmed to retrieve the static data and time series from the OPC UA interface of the machine control. AVAPS enables the querying of all data from the machine control that are offered under so-called node IDs in the machine-internal OPC UA server. The data are then stored in a specialized database, and can be exported in suitable data formats or passed internally to a machine learning model for evaluation. All data queries and collections can be performed over any number of cycles, during a fully automated industrial injection molding process. The assignment of the individual data (both time series and static data) to one another, and to the respective cycle or molded part, is ensured. The software tool runs on a conventional personal computer or server, and is connected to the machine via the IP network. The controller of the machine was configured to provide time series with a sampling time of 6 ms. According to the results given in [15], sample rates of 100 Hz and more lead to very good results for machine-learning-based part quality prediction, whereas with slower sampling, the results deteriorated in the studied example: for this reason, sampling rates of 100 Hz or more are referred to as high sampling rates in the following, and the sampling time of 6 ms (approximately 167 Hz) used in this study was thus within this range.
In order to collect the geometric dimensions of the components, a digital measurement projector was added to the injection molding measurement cell. The measurement projector was an IM-7020 (manufactured by Keyence Corporation, 1-3-14, Higashinakajima, Higashiyodogawa-ku, Osaka, 533-8555, Japan), which enables optical (contactless) 2D measurement of the geometry of the molded parts. The projector had a maximum measurement deviation of 8 µm, and was integrated into the test setup. At each cycle, the handling robot took the molded part out of the mold, and placed it on the object table of the measurement projector. The measurement was triggered automatically by a signal from the robot. After the measurement, the molded part was picked up again by the robot, and fed to a scale.
The scale was an Entris BCE323i-1S precision balance from Sartorius, with a maximum linearity deviation of 2 mg. After gravimetric measurement, the next part was fed to the quality measurement setup. The whole process took place within the cycle time of the injection molding machine. All data (both process data and quality data) were stored together, and assigned to the respective machine cycle in the AVAPS database. Note that the setup with a robot, measurement projector, and scale was only added to create extensive quality data that could be used to initially train the model, and to evaluate the predictions of the machine learning algorithms against the measured values. When using the quality prediction on an injection molding process in a real industrial application, this part of the setup can be omitted (provided that the quality predictions are precise enough): predicting part quality with a successfully trained model only requires querying the process data via OPC UA from the injection molding machine control, and processing the data in the trained model.

Data Sets
The main concerns in the evaluation of machine learning models are their ability to generalize (i.e., to be applicable in various different situations), and their dependence on the experimental conditions in which they were learned: therefore, we evaluated the same models on three different data sets that were created under different conditions. For each data set, we recorded a large number of cycles over several days, thus capturing naturally occurring process influences. Additionally, we artificially induced process influences (different setting parameter sets and process-influencing factors, as they occur under realistic process conditions), which led to diverse and therefore more representative data sets.
This approach made it possible to represent, as comprehensively as possible, most of the process states in the data sets that can occur in an industrial injection molding process over several thousand cycles over a long period of time. Training machine learning algorithms on such data tackled the problem of having to perform extensive recordings of such data on "real" industrial processes, which is very demanding due to the high time and technical requirements for 100 % measurement of the molded part quality.
Furthermore, to examine the behavior of machine learning algorithms in different processes, two molds with completely different process settings were used, as well as two types of polyamide granules.

Data Set General Structure
The data sets consisted of the recorded process data from the injection molding machine control, and the measurement data from the quality measuring devices. Both data categories were assigned to the respective machine cycle counter while the experiments were carried out, and were therefore unambiguously assigned to each other and to the respective injection molding cycle and molded part. The process data could again be divided into two categories: on the one hand, they consisted of the scalar data that can usually also be seen in the actual value log of an injection molding machine control (max. injection pressure, switchover injection pressure, melt cushion, injection time, hot runner temperature, and cylinder heating zone temperatures 2-8); on the other hand, they consisted of time series, which were the injection pressure curve and the injection flow curve that were recorded during each cycle. Each of the time series was recorded at high resolution, with a sampling time of 6 ms, and consisted of 2049 data points. The measurement data consisted of the scalar values from the scale (part weight) and the digital measurement projector (geometric measurements). Fifteen different geometric measurements were collected from each of the two parts. For this study, the measurement with the largest variance was used for the evaluation. The selection of geometric measurements is described below specifically for each data set. In the following, the process data are referred to as features, and the quality measurement data as targets.
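The assignment of process data and quality data via the cycle counter can be illustrated with a minimal sketch. The record layout, the field names, and all numeric values below are hypothetical and are not taken from the AVAPS database; the sketch only shows the idea of pairing features and targets that share a cycle counter.

```python
def join_by_cycle(process_data, quality_data):
    """Pair features and targets by cycle counter; unmatched cycles are skipped."""
    common_cycles = sorted(process_data.keys() & quality_data.keys())
    return [
        {"cycle": c, "features": process_data[c], "targets": quality_data[c]}
        for c in common_cycles
    ]

# Hypothetical records keyed by the machine cycle counter (values are made up).
process_data = {
    101: {"max_injection_pressure": 812.0, "injection_time": 1.42},
    102: {"max_injection_pressure": 815.5, "injection_time": 1.41},
    103: {"max_injection_pressure": 809.8, "injection_time": 1.43},
}
quality_data = {
    101: {"weight_g": 58.91, "distance_mm": 84.936},
    102: {"weight_g": 58.94, "distance_mm": 84.939},
    # Cycle 103 has no quality measurement here and is therefore dropped.
}

dataset = join_by_cycle(process_data, quality_data)
print([row["cycle"] for row in dataset])
```

Keying both record types by the same counter means that a missing measurement removes only the affected cycle, rather than shifting the alignment of all subsequent cycles.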

Housing Part Data Set
For the experiments to generate the first data set, a mold for a housing part with external dimensions of about 99 × 90 × 42 mm (length × width × height) and a part weight of around 59 g (PA6-GF30) was used (Figure 1). The processed plastic material (Dinalon® B1S25 G30-0288, manufactured by Repol S.L., 12550 Almazora Castellón, Spain) was dried according to the manufacturer's specifications. The housing part had a complex geometry with thin-walled (0.5 mm), partially double-walled, and ribbed structures. The process conditions were changed in several trials (Table 1) over four days. At the beginning of each test day, several cycles were performed without changing any parameters: this allowed typical start-up states of the machine, up to the stable running states, to be represented in the data. After that, the barrel and hot runner temperatures, the mold temperatures, and the injection flow were varied at different levels. In addition, pause times typical in practice were induced: for this purpose, the machine was stopped at the beginning of the respective stage, for the period of the specified pause times. The melt remained in the heated cylinder during this time, and there was no exchange of the melt after the pause time had elapsed. The machine was then operated in fully automatic mode. The mold temperature was varied on four levels. In Figure 2, the quality properties resulting from the experiment, measured using the systems described in Section 2.1, are plotted over the cycles. The geometric dimension Distance A is shown, together with the part weight. Distance A was the inside diameter at the position indicated in Figure 3. The mean value of Distance A over all 1167 cycles was 84.9372 mm. Although the measured values of Distance A were obviously influenced by the disturbance variables, the variance was only 0.00043 mm². The part weight reached a mean value of 58.92 g, with a variance of 0.0024 g².

Stacking Box Data Set I
The experiments for generating the Stacking Box Data Set I were carried out in the injection molding measuring cell, using a single-cavity mold with a hot runner for a molded part in the form of a stacking box (Figure 4). The stacking box had external dimensions of about 160 × 100 × 73 mm (length × width × height), and a part weight of about 113.5 g (PA6-GF30). A polyamide (PA) granulate PA6-GF30 (Ultramid® B3EG6, manufactured by BASF SE, 67056 Ludwigshafen, Germany) was processed. The stacking box was a predominantly thick-walled (2 mm) injection-molded part with two partially freestanding side walls, which could be affected by warpage. The process conditions were changed in several trials (Table 2), by varying the moisture content of the PA granulate, and the mold temperature. The moisture content was varied on six levels (0.05 % to 0.18 %). The mold temperature was varied on three levels (70 °C, 80 °C, and 90 °C). The data contained both start-up processes and stable running processes, performed on three different days. The process was influenced both by controlled input parameters (mold temperature) and by disturbances (material moisture content), leading to a large variation in the molded part quality (see Figure 5). In order to create artificial disturbances, the moisture content was varied by prior treatment of the granulate. The moisture values were only recorded to show the variance in the process input, and were not used as model features. Examination of the resulting measured quality data (see Figure 5) showed the clear influence of the input parameters and the disturbance variables induced in the experimental plan. The part weight reached an average of 115.16 g, and had a variance of 0.7084 g² over all 829 cycles. The geometric dimension Distance B (see Figure 6), an outside diameter, had a mean value of 101.55 mm and a variance of 0.0068 mm².

Stacking Box Data Set II
The experiments for the Stacking Box Data Set II were also performed with the mold for the stacking box. A polyamide granulate PA6-GF30 (Repol Dinalon® B1S25 G30-0288) was used. The material was dried according to the manufacturer's specifications. As in the other experiments, at the beginning of each test day, a few cycles were run to represent the machine start-up in the data, before the process was artificially influenced (Table 3). Next, the injection flow and holding pressure were decreased. To simulate a typical process interruption, the machine was paused for 15 min, with the melt staying in the barrel with switched-on barrel heaters, before starting the next batch of cycles. Finally, the barrel and hot runner temperatures were varied. As shown in Figure 7, the measured quality properties over 1332 cycles again showed the clear influence of the experimental design. The part weight reached a mean value of 113.54 g, with a variance of 0.1929 g². The measurement of the geometric quantity Distance B yielded a mean value of 101.44 mm, with a variance of 0.0116 mm².

Machine Learning Methodology
Regression methods are used to predict numeric values (the target values) based on given features (explanatory variables). For molding quality prediction in injection molding, the process data are considered as features: the scalar features or the elements of the time series data. A measure of product quality data can be used as the target.
Regression models belong to the field of supervised machine learning: this means that a model has learnable parameters, which have to be determined through a training procedure that takes both features and target values for a representative set of instances. The learned model can then be used on data where the target values are unknown, to predict them. For the molding process, this means that, for a number of cycles, the desired quality feature has to be measured. Once the model is learned properly, it can be applied to newly produced parts, and can predict the quality feature with a certain precision: if the latter is sufficient, further expensive quality measurements can be omitted and replaced by the regression model's predictions.
In addition to their learnable parameters, many models also have to be parameterized by hyperparameters, which are variables that are chosen by the user, rather than learned, before the learning process begins. Different choices yield different models, and thus influence the resulting predictive power of a model. Usually, a fixed set of candidate values for these hyperparameters is selected and, for each combination, a model is learned and evaluated. Then, the hyperparameter combination that yields the highest prediction quality is chosen to be run in production.
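The selection of hyperparameters from a fixed candidate set can be sketched as follows. This is a minimal illustration on synthetic data, assuming Ridge Regression with a hypothetical candidate list for its regularization strength `alpha` and a single train/validation split; it is not the configuration used in the study.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 "cycles" with 20 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical candidate values for the regularization strength.
candidates = [0.01, 1.0, 100.0, 10000.0]

val_errors = {}
for alpha in candidates:
    model = Ridge(alpha=alpha).fit(X_train, y_train)  # learn the parameters
    val_errors[alpha] = mean_squared_error(y_val, model.predict(X_val))

# Keep the candidate with the lowest validation error.
best_alpha = min(val_errors, key=val_errors.get)
print(best_alpha, val_errors[best_alpha])
```

The learnable parameters (the regression coefficients) are fitted on the training part only, whereas the hyperparameter is judged on data that the fitted model has not seen.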
To evaluate a model, again, feature data with known targets (in-cycle sensor measurements with known quality results) are used. The model is run on the features, and the thus-predicted value is compared to the actual target.

Nested Cross-Validation
The above-described learning process comprises three steps: learning the learnable parameters; selecting the best hyperparameters; and comparing different algorithms against one another. In machine learning, it is well known that for these three tasks, three different (disjoint) sets of data have to be used, to minimize the risk of overfitting the model [16]. Overfitting means that the model might pick up on patterns in its training data that are specific artifacts of that data, but that are not generally true: in such cases, the model would yield good predictions on the data that it is trained on, but severely worse predictions on previously unseen data. Thus, the training performance overestimates the actual predictive power in the real-use case. As observed in Section 3, due to the large number of individual values per cycle or iteration, the number of features is very high compared to the number of elements (cycles) in the data sets, and the risk of overfitting is therefore particularly high.
To counter these effects, we employed the procedure of repeated, nested cross-validation [16], a procedure which splits the available data (features with targets), at random, into three parts: the training data, for training learnable parameters; the validation set, for selecting the best hyperparameters; and the test data, for evaluating the best-configured model, and for comparing it to other algorithms. Thus, each task has its own subset of the data, and the models are evaluated on data that have been used neither for training nor for hyperparameter optimization (and so the evaluation is similar to the real-use case, where the model also encounters new, unseen data). To avoid random artifacts resulting from choosing one particular split, algorithms are compared on multiple splits.
In our experiments, we repeated each cross-validation five times, and split the data into 10 folds: each time, one of the splits was selected as test data; the remaining nine splits were merged, and again split into a 10-fold cross-validation (hence, nested cross-validation), to compare the hyperparameters and train the models. Thus, for each algorithm and each selected set of features, 50 tests were conducted, using models that had each been optimized on 50 data sets, yielding a total of 2500 experiments per algorithm and feature set.
Consequently, each considered algorithm was evaluated on 50 different subsets of the available data. By the design of the procedure, each instance in a data set was used five times in a test set: thus, every cycle in the data had the same influence on the overall results. The numbers reported in the next section were averaged over these 50 runs.
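The structure of such a repeated, nested cross-validation can be sketched with Scikit-learn. The data are synthetic, the Ridge Regression model and its candidate values are placeholders, and, to keep the run short, the inner loop uses a single 10-fold split; the exact repetition scheme of the inner loop in the study is therefore not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    RepeatedKFold,
    cross_val_score,
)

# Synthetic stand-in data: 120 "cycles" with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X[:, 0] + 0.1 * rng.normal(size=120)

# Outer loop: 10 folds, repeated five times, i.e., 50 test sets.
outer_cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)

# Inner loop: the nine remaining folds are split again to select
# hyperparameters (simplified to one 10-fold split in this sketch).
inner_cv = KFold(n_splits=10, shuffle=True, random_state=1)

search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [0.01, 1.0, 100.0]},  # hypothetical candidates
    scoring="neg_mean_squared_error",          # MSE guides the selection
    cv=inner_cv,
)

# Each outer test fold is scored by a model that was tuned and trained
# without ever seeing that fold.
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print(len(scores), scores.mean())
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the procedure nested: the hyperparameter search is rerun from scratch inside every outer training set.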

Feature Selection
The feature categories available for each molding cycle were scalar data and time series data. To determine the influence of the selection of the feature data category on prediction quality, three feature combinations were evaluated separately, and were then compared: the models were computed, based only on the scalar data (s), only on the time series (t), and on the combination of both data categories (st). While the scalar data for a cycle consisted of only 12 individual values, the time series consisted of a total of 4098 values. In the models, where the combination of both data categories, i.e., scalar and time series data, was used as a feature, we had a total feature count of 4110 values per cycle.
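The three feature combinations can be illustrated by a minimal NumPy sketch. The array names and the random placeholder values are assumptions for illustration; only the dimensions (12 scalar values and two time series of 2049 points each) are taken from the description of the data sets.

```python
import numpy as np

N_SCALARS = 12   # scalar values per cycle
N_POINTS = 2049  # data points per time series

def build_features(scalars, pressure, flow, mode):
    """Return the feature matrix for the combination 's', 't', or 'st'."""
    time_series = np.hstack([pressure, flow])      # (n_cycles, 4098)
    if mode == "s":
        return scalars
    if mode == "t":
        return time_series
    if mode == "st":
        return np.hstack([scalars, time_series])   # (n_cycles, 4110)
    raise ValueError(f"unknown feature mode: {mode!r}")

# Random placeholder data for three cycles (not measured values).
rng = np.random.default_rng(0)
n_cycles = 3
scalars = rng.normal(size=(n_cycles, N_SCALARS))
pressure = rng.normal(size=(n_cycles, N_POINTS))
flow = rng.normal(size=(n_cycles, N_POINTS))

shapes = {
    mode: build_features(scalars, pressure, flow, mode).shape
    for mode in ("s", "t", "st")
}
print(shapes)
```

Each sampled point of a curve is treated as one feature column, so no indices have to be derived from the curves beforehand.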

Targets
Machine learning models have to be optimized, with respect to exactly one quantity; therefore, in our experiments, a single quality property was used, to learn and compare models. We conducted two series of experiments: one where the target was the weight of the corresponding molded part, and one where a particular geometric dimension of the part served as the target.

Evaluation Measures and Significance
In the comparison of different models, as well as in the comparison of different hyperparametrizations, evaluation measures were needed that summarized the difference between the actual target and the predicted value over multiple cases into one score. For the optimization of hyperparameters, and during the training process, the mean squared error (MSE) was used, i.e.,
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_{\mathrm{ref},i} - y_{\mathrm{pred},i} \right)^2,$$
where $N$ is the number of instances (cycles) in the data set, $y_{\mathrm{ref},i}$ is the actual value (the measured quality of a cycle's product), and $y_{\mathrm{pred},i}$ represents the corresponding predicted value.
In the overall comparison, we used two different measures. The coefficient of determination,
$$R^2 = 1 - \frac{\frac{1}{N} \sum_{i=1}^{N} \left( y_{\mathrm{ref},i} - y_{\mathrm{pred},i} \right)^2}{\frac{1}{N} \sum_{i=1}^{N} \left( y_{\mathrm{ref},i} - \bar{y}_{\mathrm{ref}} \right)^2},$$
where $\bar{y}_{\mathrm{ref}}$ is the mean of the actual values, was used to quantify the explained variance of the data: to that end, it related the mean squared error of the predictions (numerator) to the variance of the data (denominator). Higher values meant better predictions. The mean absolute percentage error,
$$\mathrm{MAPE} = \frac{100\,\%}{N} \sum_{i=1}^{N} \left| \frac{y_{\mathrm{ref},i} - y_{\mathrm{pred},i}}{y_{\mathrm{ref},i}} \right|,$$
quantified the average relative error, i.e., the prediction error relative to the actual value. R² was directly dependent on MSE, and therefore punished large differences more than smaller ones. In contrast, MAPE focused on the average relative deviation from the actual value, and was more easily interpretable, as it directly stated by what percentage the predictions were off, on average.
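The three measures can be written down directly from their standard definitions. The following plain-Python sketch uses small made-up numbers so that the results can be checked by hand; the function and variable names are illustrative.

```python
def mse(y_ref, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / len(y_ref)

def r2(y_ref, y_pred):
    """Coefficient of determination: 1 - MSE / variance of the actual values."""
    mean_ref = sum(y_ref) / len(y_ref)
    variance = sum((r - mean_ref) ** 2 for r in y_ref) / len(y_ref)
    return 1.0 - mse(y_ref, y_pred) / variance

def mape(y_ref, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(y_ref, y_pred)) / len(y_ref)

# Made-up example: two actual values and two predictions.
y_ref = [100.0, 200.0]
y_pred = [110.0, 180.0]

print(mse(y_ref, y_pred))   # (10^2 + 20^2) / 2 = 250
print(r2(y_ref, y_pred))    # 1 - 250 / 2500 = 0.9
print(mape(y_ref, y_pred))  # 100 * (0.1 + 0.1) / 2 = 10
```

In the example, both predictions are off by 10 % of the actual value, so the MAPE is 10 %, while the squared error of the second prediction contributes four times as much to the MSE as that of the first.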
When we compared the results of two algorithms, or the same algorithm on two different feature sets, we compared their average performance on 50 different subsets of the data (see above). While the win of one algorithm over another on only one data set could be the result of random artifacts, this became less likely when the comparison was made on multiple data sets (here, 50). To quantify this, we followed the suggestion in [17], and used the Wilcoxon signed-rank test [18] to confirm whether the observed differences between two models were significant.
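A paired comparison of this kind can be sketched with the Wilcoxon signed-rank test from SciPy. The two score vectors below are synthetic placeholders for the 50 paired results of two feature sets, not values from the experiments; the significance level of 0.05 corresponds to the one stated for the result tables.

```python
import numpy as np
from scipy.stats import wilcoxon

# Synthetic placeholder scores: 50 paired R² values, e.g., for the same
# algorithm on scalar features (s) and on combined features (st).
rng = np.random.default_rng(0)
r2_scalar = rng.normal(loc=0.66, scale=0.05, size=50)
r2_combined = r2_scalar + rng.normal(loc=0.10, scale=0.03, size=50)

# The test is paired: entry i of both vectors must stem from the same
# train/test split.
statistic, p_value = wilcoxon(r2_combined, r2_scalar)

ALPHA = 0.05
significant = p_value < ALPHA
print(p_value, significant)
```

Because the test only uses the signs and ranks of the paired differences, it does not assume that the differences are normally distributed.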

Baselines
When approaching a prediction problem, it is not a priori clear how hard this problem will be, i.e., whether predictions are easy or difficult. To get an impression of the difficulty of a problem, and to assess the value of investing in complex regression models, it helps to compare them to the results of simple baselines.
In regression, two baselines are common: baseline mean and baseline median, which can be seen as dummy models or naive predictors. Baseline mean simply predicts the mean target of the values seen during training, whereas baseline median does the same with the median of the training targets. Both baselines completely ignore the actual feature data, and always "predict" the same constant value (the mean or median, respectively).
Every regression model that is considered for use in production should significantly outperform both baselines: only a significantly higher prediction quality justifies the effort of training and employing such models. If baseline mean or median models provide high coefficients of determination (R²) and low errors (MAPE), this indicates targets with low variance, and thus, possibly, tasks where prediction models are inappropriate [19].
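Both baselines are available in Scikit-learn as `DummyRegressor`. The sketch below applies them to synthetic part weights that are drawn, as an assumption, from a normal distribution with the mean (58.92 g) and variance (0.0024 g²) reported for the Housing Part Data Set; these are not the measured values.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_percentage_error, r2_score

# Synthetic weights (assumed normal, using the reported mean and variance).
rng = np.random.default_rng(0)
y = rng.normal(loc=58.92, scale=np.sqrt(0.0024), size=1000)
X = np.zeros((len(y), 1))  # the baselines ignore the features entirely

X_train, X_test = X[:900], X[900:]
y_train, y_test = y[:900], y[900:]

results = {}
for strategy in ("mean", "median"):
    baseline = DummyRegressor(strategy=strategy).fit(X_train, y_train)
    y_pred = baseline.predict(X_test)  # the same constant for every cycle
    results[strategy] = {
        "r2": r2_score(y_test, y_pred),
        "mape_percent": 100.0 * mean_absolute_percentage_error(y_test, y_pred),
    }

print(results)
```

A constant prediction computed on the training data cannot reach an R² above zero on the test data, whereas its MAPE can still be very small when the target varies little relative to its mean. This is why both measures should be read together.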

Regression Algorithms
In order to investigate the influence of the selection of available features (scalar, time series, or scalar and time series) on the prediction performance of the molded part quality prediction, five different state-of-the-art and well-established supervised machine learning regression algorithms were chosen. These modeling approaches were considered to be well investigated, and were therefore suitable for benchmarking performance on different feature sets.
Robust implementations were available in the open-source machine learning software library, Scikit-learn, for the Python programming language. The following five algorithms were used: Decision Tree Regression, k Nearest Neighbors, Linear Regression, Ridge Regression, and Support Vector Regression with an RBF kernel (SVR RBF), which also learns a linear model; however, the feature space is first transformed by a non-linear function, the RBF kernel. The resulting regression model is linear in the transformed space but non-linear in the original space of features.
All these algorithms have different hyperparameters, e.g., the number k of considered nearest neighbors in k Nearest Neighbors, or the heuristic by which a decision tree decides the next cut in the feature space partition. Others include various numerical parameters controlling the influence of certain components in algorithms, such as the influence of the regularization in Ridge Regression, or coefficients in the RBF kernel. These hyperparameters have to be selected by the user. To find proper choices, a set of candidates is chosen and evaluated. The best-performing combination is then used in production (see Section 3.1).
Finally, the two baselines (mean and median) are also regression algorithms, albeit simple ones without hyperparameters.
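The algorithm names used in the result tables map onto Scikit-learn estimator classes as sketched below. The hyperparameter candidates are hypothetical examples of the kinds of settings mentioned above, not the grids used in the study, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

models = {
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "K Nearest Neighbors": KNeighborsRegressor(),
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "SVR RBF": SVR(kernel="rbf"),
}

# Hypothetical candidate values per algorithm (illustration only).
param_grids = {
    "Decision Tree Regression": {"splitter": ["best", "random"]},
    "K Nearest Neighbors": {"n_neighbors": [3, 5, 10]},
    "Linear Regression": {},
    "Ridge Regression": {"alpha": [0.1, 1.0, 10.0]},
    "SVR RBF": {"C": [1.0, 10.0], "gamma": ["scale", "auto"]},
}

# Synthetic stand-in data: 60 "cycles" with 4 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X[:, 0] - 0.5 * X[:, 1] + 0.05 * rng.normal(size=60)

# All estimators share the same fit/predict interface.
predictions = {name: m.fit(X, y).predict(X) for name, m in models.items()}
print({name: p.shape for name, p in predictions.items()})
```

The shared interface is what allows every algorithm to be run through the same cross-validation and evaluation code, with only the estimator and its candidate grid exchanged.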

Results
In this section, we present the results of our part quality regression experiments on all three data sets. We not only compared multiple typical regression algorithms regarding their predictive power, but also used different types of features (s, t, or st), and compared the respective regression quality (using MAPE and R²): that is, we compared the results obtained using solely scalar features (s) with those obtained using time series (t) or the combination of time series and scalar features (st).
All the reported values are averaged results from 50 experiments (10-fold cross-validation, repeated with five different splits). We tested the significance of the differences between those results ("s vs. t" or "s vs. st"): to that end, we followed the suggestion in [17], and used the Wilcoxon signed-rank test [18]. While most differences were confirmed to be significant, there were some cases where both types of features yielded comparable results: these cases are reported in italics. Naturally, this always includes the two baselines (mean and median), as they yielded the exact same results, independent of the chosen features. To estimate the variation of prediction performance within the 50 experiments, for each algorithm and feature combination, the standard deviations for R² and MAPE are given in Appendix A.

Housing Part Data Set
The results for the prediction of a part's weight in the Housing Part data set are shown in Table 4. The highest R² values were obtained with k Nearest Neighbors regression. Based on scalar features only (s), R² was 0.660; with time series features only (t), R² was 0.750. The highest R² of 0.777 was achieved with the combination of scalar and time series features (st). The lowest mean absolute percentage errors (MAPE) were also achieved with k Nearest Neighbors regression: 0.029 % for (s), 0.024 % for (t), and 0.023 % for (st). Thus, the best value was reached using st. The R² values for the baselines mean and median were all below zero; however, their MAPEs were not particularly high, at 0.063 % and 0.062 %, respectively.
The results for Target Distance A in Table 5 show the highest R² values for Ridge Regression. With the st features, it reached the highest R² of 0.502 together with the lowest MAPE of 0.017 %. As with molding weight, the R² values for the two baselines were negative, although very low MAPEs were also obtained here.

Stacking Box Data Set I
The results of the model comparison on the Stacking Box Data Set I are shown in Table 6. Compared to the results on the previous data set, the R² values of all the models were much higher. Except for the Decision Tree Regression, the highest R² values were obtained for the st features. The R² values were 0.993 for SVR RBF, 0.993 for k Nearest Neighbors, 0.992 for Ridge Regression, and 0.989 for Linear Regression, i.e., very close to each other. The MAPEs were similarly close to each other, at a low level. The MAPEs of the baselines mean and median were significantly higher than those of the regression models.
Table 7 shows the results of the model comparison for the target geometric dimension, Distance B: here, the SVR RBF performed best, with an R² of 0.785 for the st features. The MAPE for the SVR RBF was 0.027 %, about half of the baseline mean and median MAPEs.
Table 6. Stacking Box Data Set I: results for Target Weight. The differences in comparisons of st or t vs. s were tested for significance, using the Wilcoxon signed-rank test (α = 0.05). Where the test did not confirm significant differences, the respective st or t values are printed in italic. The highest R² and the lowest MAPE are shown in bold for each algorithm.

Stacking Box Data Set II
The results for the Stacking Box Data Set II in Table 8, which was also recorded with the stacking box tool, show similarly high R² values for part weight as for the Stacking Box Data Set I. Decision Tree Regression, k Nearest Neighbors, Linear Regression, Ridge Regression, and SVR RBF all had an R² above 0.900 for the st and t features. The highest R² of 0.991 was achieved with Ridge Regression and the st features: with a low MAPE of 0.023, it was much more accurate than the baseline models. The lowest errors were achieved across all models with the st features.
The results for Distance B for the Stacking Box Data Set II in Table 9 have the highest R² values for any geometric target in this work. The st and t features achieved the highest R² values. The MAPEs behaved analogously: they were also below those of the baseline models.
Table 9. Stacking Box Data Set II: results for Target Distance B. The differences in comparisons of st or t vs. s were tested for significance, using the Wilcoxon signed-rank test (α = 0.05): where the test did not confirm significant differences, the respective st or t values are printed in italic. The highest R² and the lowest MAPE are shown in bold for each algorithm.

Housing Part Data Set
The best performing model for predicting the part weight in the Housing Part data set was obtained with the k Nearest Neighbors algorithm, and achieved the highest R², of 0.777, and the lowest MAPE, of 0.023 %, with the feature combination of scalar data and time series (st). For part weight prediction, the R² was low: this was due to the low variance of the measured part weights in the data set. The low variance was also reflected in the comparatively low MAPE of 0.062 % of the baseline median algorithm. Despite the large manual variation of the input parameters when performing the experiments, the resulting variance in the part quality was not large enough: the machine-mold combination had yielded a very robust process. While this is good news for the operators, it makes it hard to learn the influences of process parameters; nevertheless, the k Nearest Neighbors algorithm, with its MAPE of 0.023 %, provided significantly better predictions than the baselines. The stability of the predictive performance across all test data set splits was also reflected in the low standard deviations for MAPE (shown in Table A1). For the prediction of the geometric dimension, lower prediction performances were achieved across all algorithms. Although this tendency can also be observed in other studies [9], the particularly low level of R² was again due to the low variance of the target in the data set (see baseline MAPE).
In summary, for this data set, it can be deduced that the best prediction results were achieved based on the combination of time series and scalar data. As the second-best results were achieved with the time series alone, it can be concluded that the time series were highly relevant.

Stacking Box Data Set I
For the Stacking Box Data Set I, significantly higher R² values were achieved for the prediction of the target part weight: this was due to the significantly higher variance of the measured target data, which was also reflected in the high MAPE of the baseline. The highest R² values for the individual algorithms were above 0.980: except for one outlier (Decision Tree Regression), these were achieved by the feature combination of time series and scalar data. The algorithm best suited for this prediction task was, again, the k Nearest Neighbors algorithm, with an R² of 0.993 and a MAPE of 0.036 %. The low standard deviations for R², of 0.0027, and MAPE, of 0.00050 %, across all test data set splits, confirm the stable prediction performance of the k Nearest Neighbors algorithm (see Table A3). The SVR RBF achieved the same R², but had a higher MAPE, of 0.047 %. For the prediction of the geometric dimension, a lower overall level of R² was again achieved, due to the lower variance of the target, with the baseline MAPE being one power of ten lower. The SVR RBF achieved the highest R², of 0.785, with a MAPE of 0.027 % based on the combination of time series and scalar data. The same low MAPE was achieved by the k Nearest Neighbors algorithm, but with a slightly lower R², of 0.780. Looking at the standard deviations in Table A4, it can be seen that the SVR RBF algorithm achieved more stable predictions in this case.
Similar to the Housing Part data set, it can be summarized for this data set that the best prediction performance was achieved by a combination of time series and scalar data as features.

Stacking Box Data Set II
Although the induced disturbances in the experimental design for the Stacking Box Data Set II were completely different from those for the Stacking Box Data Set I, similarly high R² values for the prediction of part weight were achieved. The higher R² values for the feature combination of time series and scalar data across all algorithms were also clear for this data set. The results of the analysis for the geometrical target, with slightly higher variance and baseline MAPE, show the highest R² values for geometrical targets in this investigation: they were achieved with the feature combination. For k Nearest Neighbors, the time series features alone even showed a slightly higher R², but the MAPEs were the same.

Further Discussion
In the presented comparison, the best prediction models, with the highest coefficients of determination across all algorithms, were achieved for the feature combination of time series and scalar data for the Target Weight in the Stacking Box I and II data sets. In addition to the low MAPE for these algorithms, the standard deviations of the MAPE were also very low, which indicated stable learning processes and reliable prediction performance results. For the Target Distance B, the highest coefficients of determination across all the algorithms were lower. The relatively large differences in prediction performance between the two quality attributes can be explained by the lower variance in the measured quality data of the Distance B target; however, the MAPEs of the best models for Target Distance B in the Stacking Box I and II data sets were still very low and, depending on the manufacturing tolerances, did not exclude an application for process monitoring in practice. The same observation can be made for the Housing Part data set: here, the measured quality data varied even less, and did not allow for better coefficients of determination for both target categories.
To generate well-performing prediction models, it is important to use representative training data: we saw this tendency clearly when comparing the results from the Housing Part data set to those from the Stacking Box I and II data sets. It can be assumed that the lower the variance of the targets (with still-varying feature data), the more difficult the modeling task is for the algorithms. It can be concluded that the nature of the data sets, especially the variance of the measured quality data, has a strong influence on the prediction performance characteristics. It is important to test new approaches on different large data sets with realistic process influences. In summary, however, it can be said that even if the prediction performance characteristics of the presented algorithms are low, the quality in the real process also varies little, i.e., in the context of process monitoring for injection molding, small deviations from product specifications are to be expected.
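The dependence of the coefficient of determination on the target variance can be made concrete with a small worked example. It is a sketch under the assumption that R² is computed as one minus the mean squared error divided by the variance of the targets; the numbers are invented, and the function name `r2_from_errors` is hypothetical.

```python
from statistics import mean, pvariance


def r2_from_errors(targets, errors):
    """R² = 1 - mean squared error / variance of the targets."""
    mse = mean(e ** 2 for e in errors)
    return 1.0 - mse / pvariance(targets)


# Identical prediction errors in both cases (hypothetical values).
errors = [0.01, -0.01, 0.01, -0.01]

# Low-variance target, comparable to a very robust process.
narrow_targets = [10.00, 10.01, 10.02, 10.03]
# High-variance target, comparable to a strongly disturbed process.
wide_targets = [10.0, 10.5, 11.0, 11.5]

r2_narrow = r2_from_errors(narrow_targets, errors)
r2_wide = r2_from_errors(wide_targets, errors)
print(r2_narrow, r2_wide)
```

With the same absolute errors, the low-variance target yields an R² of about 0.2, whereas the high-variance target yields an R² close to 1. A low R² on a stable process therefore does not by itself indicate large prediction errors.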
The experiments confirm once more that, by including time series features, the performance of ML prediction algorithms can be improved.

Conclusions
The evaluation of the extensive data sets shows that time series and the combination of time series and scalar data as features allow significantly higher coefficients of determination and lower errors, i.e. better prediction models.
The time series were used directly as features for the models, without reducing their information content through prior formation of indices. The results show that the inclusion of the injection pressure curve and the injection flow curve as features for molding quality prediction in injection molding produces a significant improvement in prediction quality: therefore, high-resolution time series should be considered directly as features in future process monitoring methods that are based on molding quality prediction models.
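Using the time series directly as features can be pictured as concatenating the sampled curve values with the scalar features into one flat vector per injection molding cycle. The following is a minimal sketch, assuming that all cycles are sampled to the same curve length; the function name `build_features`, the curve values, and the curve length of five samples are hypothetical and chosen only for readability.

```python
def build_features(scalars, pressure_curve, flow_curve, mode="st"):
    """Return the feature vector of one cycle for the feature types s, t, or st."""
    time_series = list(pressure_curve) + list(flow_curve)
    if mode == "s":
        return list(scalars)
    if mode == "t":
        return time_series
    if mode == "st":
        return list(scalars) + time_series
    raise ValueError(f"unknown feature mode: {mode!r}")


# One hypothetical cycle: 12 scalar process values and two short curves.
scalars = [float(i) for i in range(12)]
pressure_curve = [0.0, 120.5, 480.2, 455.0, 90.3]  # injection pressure samples
flow_curve = [0.0, 35.1, 60.4, 12.2, 0.0]          # injection flow samples

features = {
    mode: build_features(scalars, pressure_curve, flow_curve, mode)
    for mode in ("s", "t", "st")
}
print({mode: len(vector) for mode, vector in features.items()})
```

Every sample of a curve becomes its own feature column, which is why the feature count grows quickly with the sampling resolution.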
The time series contain more information about the process dynamics (e.g., effect of disturbances) than do aggregated scalar values. In our experiments, we demonstrated that standard ML algorithms are able to utilize this additional information for the benefit of the resulting prediction accuracy. Moreover, we can be confident that the increase in accuracy from including time series is a general tendency, as it showed repeatedly in different settings (different parts, quality measures, and process variations).
The price for the higher prediction accuracy is the inclusion of significantly more features: in our example, the feature count went up from 12 scalar features to 4089 different features. A large number of features means high complexity and many degrees of freedom in the learning process of machine learning algorithms: this, in turn, is known to cause these algorithms to overfit. It is therefore imperative to evaluate such settings carefully: in this work, we met this challenge by training with extensive experimental data and state-of-the-art validation methods, applying nested cross-validation followed by significance tests. We saw stable results in all comparisons, with either comparable results or (in most cases) significant improvements when including the full time series.
Furthermore, we demonstrated that a comparably high prediction performance can be achieved without sensor data from the mold: this suggests that the relevant information required for the models is already contained in the injection pressure and injection flow curves from the machine's internal sensors. This would enable the use of process data-based quality prediction models in practice, for injection molding processes without in-mold sensors. Of course, the mandatory effort for creating a training data set (collecting quality data) for each new injection molding process is not reduced; however, when using prediction algorithms, such measurements have to be taken only for a set of training instances, instead of every produced item.
In future research, the presented approach will be extended by applying machine learning methods that can be specifically adapted for the interpretation of time series as features using Convolutional Neural Networks (CNNs): these may be even better suited to extracting relevant information from the data. CNNs can be used to make the information contained in the dynamic context of the individual values within a time series accessible for the model. The data sets generated and the results presented in this paper will serve as the basis for this future work. The data sets will be made publicly available with the publication of this paper, and can be used by the scientific community for comparison.

Data Availability Statement: Publicly available data sets were analyzed in this study. These data sets can be found here: https://github.com/sc4t1m/scatimdata.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
The following tables present the standard deviations (SD) for R² and MAPE for all data sets and targets. The standard deviation is an indicator of the stability of the prediction performance: each value is computed over the individual data splits in the context of the applied nested cross-validation (see Section 3.1).